    Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease

    From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease based on genetic variations at the DNA base pair level, called Single-Nucleotide Polymorphisms (SNPs), collected from the Ontario Heart Genomics Study (OHGS). First, the thesis explains two commonly used supervised learning algorithms, the k-Nearest Neighbour (k-NN) and Random Forest classifiers, and includes a complete proof that the k-NN classifier is universally consistent in any finite dimensional normed vector space. Second, the thesis introduces two dimensionality reduction steps: Random Projections, a known feature extraction technique based on the Johnson-Lindenstrauss lemma, and a new method termed Mass Transportation Distance (MTD) Feature Selection for discrete domains. Then, the thesis compares the performance of Random Projections with the k-NN classifier against MTD Feature Selection with Random Forest for predicting artery disease, based on accuracy, the F-Measure, and area under the Receiver Operating Characteristic (ROC) curve. The comparative results demonstrate that MTD Feature Selection with Random Forest is vastly superior to Random Projections and k-NN. The Random Forest classifier obtains an accuracy of 0.6660 and an area under the ROC curve of 0.8562 on the OHGS genetic dataset when 3335 SNPs are selected by MTD Feature Selection for classification. This area is considerably better than the previous high score of 0.608 obtained by Davies et al. in 2010 on the same dataset. Comment: This is a Master of Science in Mathematics thesis under the supervision of Dr. Vladimir Pestov and Dr. George Wells, submitted on January 31, 2014 at the University of Ottawa; 102 pages and 15 figures.

    Swarm Differential Privacy for Purpose Driven Data-Information-Knowledge-Wisdom Architecture

    Privacy protection has recently been in the spotlight in both academia and industry. Society protects individual data privacy through complex legal frameworks. The increasing number of applications of data science and artificial intelligence has resulted in a higher demand for the ubiquitous application of the data. The privacy protection of the broad Data-Information-Knowledge-Wisdom (DIKW) landscape, the next generation of information organization, has taken a secondary role. In this paper, we explore DIKW architecture through the applications of the popular swarm intelligence and differential privacy. As differential privacy has proved to be an effective data privacy approach, we look at it from a DIKW domain perspective. Swarm Intelligence can effectively optimize and reduce the number of items in DIKW used in differential privacy, thus accelerating both the effectiveness and the efficiency of differential privacy for crossing multiple modals of conceptual DIKW. The proposed approach is demonstrated through the application of personalized data that is based on the open-source IRIS dataset. This experiment demonstrates the efficiency of Swarm Intelligence in reducing computing complexity.

    Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

    We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages. Compared to similar efforts such as Multilingual BERT and XLM, three new cross-lingual pre-training tasks are proposed, including cross-lingual word recovery, cross-lingual paraphrase classification and cross-lingual masked language model. These tasks help Unicoder learn the mappings among different languages from more perspectives. We also find that fine-tuning on multiple languages together can bring further improvement. Experiments are performed on two tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA), where XLM is our baseline. On XNLI, 1.8% averaged accuracy improvement (on 15 languages) is obtained. On XQA, which is a new cross-lingual dataset built by us, 5.5% averaged accuracy improvement (on French and German) is obtained. Comment: Accepted to EMNLP2019; 10 pages, 2 figures.

    Rehabilitation treatment of multiple sclerosis

    Multiple sclerosis is a slowly progressive disease. Immunosuppressants and other drugs can delay the progression of the disease, but most patients will be left with varying degrees of neurological deficit symptoms, such as muscle weakness, muscle spasm, ataxia, sensory impairment, dysphagia, cognitive dysfunction, psychological disorders, etc. From the early stage of the disease to the stage of disease progression, professional rehabilitation treatment can reduce the functional impairment of multiple sclerosis patients, improve neurological function, and reduce family and social burdens. With the development of various new rehabilitation technologies such as transcranial magnetic stimulation, virtual reality technology, robot-assisted gait, telerehabilitation and transcranial direct current stimulation, the advantages of rehabilitation therapy in multiple sclerosis treatment have been further established, and more treatment options have also been provided for patients.

    CRAFTS for Fast Radio Bursts: extending the dispersion-fluence relation with new FRBs detected by FAST

    We report three new FRBs discovered by the Five-hundred-meter Aperture Spherical radio Telescope (FAST), namely FRB 181017.J0036+11, FRB 181118, and FRB 181130, through the Commensal Radio Astronomy FAST Survey (CRAFTS). Together with FRB 181123, which was reported earlier, all four FAST-discovered FRBs share the same characteristics of low fluence ($\leq 0.2$ Jy ms) and high dispersion measure (DM, $>1000$ pc cm$^{-3}$), consistent with the anticorrelation between DM and fluence of the entire FRB population. FRB 181118 and FRB 181130 exhibit band-limited features. FRB 181130 is prominently scattered ($\tau_s \simeq 8$ ms) at 1.25 GHz. FRB 181017.J0036+11 has full-bandwidth emission with a fluence of 0.042 Jy ms, which is one of the faintest FRB sources detected so far. CRAFTS has started to build a new sample of FRBs that fills the region for more distant and fainter FRBs in the fluence-$\mathrm{DM_E}$ diagram, previously out of reach of other surveys. The implied all-sky event rate of FRBs is $1.24^{+1.94}_{-0.90} \times 10^5$ sky$^{-1}$ day$^{-1}$ at the 95% confidence interval above 0.0146 Jy ms. We also demonstrate here that the probability density function of CRAFTS FRB detections is sensitive to the assumed intrinsic FRB luminosity function and cosmological evolution, which may be further constrained with more discoveries.

    CRAFTS for Fast Radio Bursts: Extending the dispersion-fluence relation with new FRBs detected by FAST

    We report three new FRBs discovered by the Five-hundred-meter Aperture Spherical radio Telescope (FAST), namely FRB 181017.J0036+11, FRB 181118 and FRB 181130, through the Commensal Radio Astronomy FAST Survey (CRAFTS). Together with FRB 181123, which was reported earlier, all four FAST-discovered FRBs share the same characteristics of low fluence ($\leq 0.2$ Jy ms) and high dispersion measure (DM, $>1000$ pc cm$^{-3}$), consistent with the anti-correlation between DM and fluence of the entire FRB population. FRB 181118 and FRB 181130 exhibit band-limited features. FRB 181130 is prominently scattered ($\tau_s \simeq 8$ ms) at 1.25 GHz. FRB 181017.J0036+11 has full-bandwidth emission with a fluence of 0.042 Jy ms, which is one of the faintest FRB sources detected so far. CRAFTS has started to build a new sample of FRBs that fills the region for more distant and fainter FRBs in the fluence-$\mathrm{DM_E}$ diagram, previously out of reach of other surveys. The implied all-sky event rate of FRBs is $1.24^{+1.94}_{-0.90} \times 10^5$ sky$^{-1}$ day$^{-1}$ at the 95% confidence interval above 0.0146 Jy ms. We also demonstrate here that the probability density function of CRAFTS FRB detections is sensitive to the assumed intrinsic FRB luminosity function and cosmological evolution, which may be further constrained with more discoveries. Comment: 9 pages, 4 plots and 1 table. The Astrophysical Journal Letter Accepted.